Optimal Testing for Crowd Workers
Abstract
Requesters on crowdsourcing platforms, such as Amazon Mechanical Turk, routinely insert gold questions to verify that a worker is diligent and is providing high-quality answers. However, there is no clear understanding of when and how many gold questions to insert. Typically, requesters mix a flat 10–30% of gold questions into the task stream of every worker. This static policy is arbitrary and wastes valuable budget: the exact percentage is often chosen with little experimentation and, more importantly, it does not adapt to individual workers, to the current mixture of spamming vs. diligent workers, or to the number of tasks workers perform before quitting. We formulate the problem of balancing between (1) testing workers to determine their accuracy and (2) actually getting work performed as a partially observable Markov decision process (POMDP) and apply reinforcement learning to dynamically calculate the best policy. Evaluations on both synthetic data and with real Mechanical Turk workers show that our agent learns adaptive testing policies that produce up to 111% more reward than the non-adaptive policies used by most requesters. Furthermore, our method is fully automated, easy to apply, and runs mostly out of the box.
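To make the testing-versus-working trade-off concrete, the sketch below tracks a belief that a worker is diligent and updates it with Bayes' rule after each gold question. It is an illustration only, not the paper's POMDP model or its learned policy. The two-class worker model, the accuracies (0.9 and 0.5), the thresholds, and the names `update_belief` and `choose_action` are all assumptions introduced here.

```python
def update_belief(p_diligent, correct, acc_diligent=0.9, acc_spammer=0.5):
    """Bayes update of P(worker is diligent) after one gold question.

    Assumes two worker classes with fixed, known accuracies
    (hypothetical values, not taken from the paper).
    """
    like_d = acc_diligent if correct else 1.0 - acc_diligent
    like_s = acc_spammer if correct else 1.0 - acc_spammer
    numerator = like_d * p_diligent
    return numerator / (numerator + like_s * (1.0 - p_diligent))


def choose_action(p_diligent, low=0.3, high=0.9):
    """Hand-written threshold rule standing in for a learned policy.

    Returns "dismiss" when the worker is probably a spammer, "work"
    when the belief is high enough to stop testing, and "test"
    (ask another gold question) otherwise.
    """
    if p_diligent < low:
        return "dismiss"
    if p_diligent >= high:
        return "work"
    return "test"


if __name__ == "__main__":
    belief = 0.5  # assumed prior share of diligent workers
    for answered_correctly in [True, True, False, True]:
        belief = update_belief(belief, answered_correctly)
        print(round(belief, 3), choose_action(belief))
```

Unlike a flat gold-question percentage, a rule of this kind spends test questions only while the belief is uncertain. The approach described in the abstract goes further by learning when to test with reinforcement learning rather than relying on fixed thresholds.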
Related resources
Multi-Objective Crowd Worker Selection in Crowdsourced Testing
Crowdsourced testing is an emerging trend in software testing, which relies on crowd workers to accomplish test tasks. Typically, a crowdsourced testing task aims to detect as many bugs as possible within a limited budget. For a specific test task, not all crowd workers are qualified to perform it, and different test tasks require crowd workers to have different experiences, domain knowledge, e...
Understanding Job Satisfaction of Crowd Workers: An Empirical Analysis of Its Determinants and Effects
Crowd work has emerged as a new pattern of digitally mediated collaboration. In this paper, we focus on the determinants and effects of crowd workers’ job satisfaction, a perspective that has been largely neglected by current crowdsourcing research. We report results from a survey of 161 crowd workers participating in crowdsourced software testing. Our research shows that job satisfaction mediat...
The Communication Network Within the Crowd
Since its inception, crowdsourcing has been considered a black-box approach to solicit labor from a crowd of workers. Furthermore, the “crowd” has been viewed as a group of independent workers dispersed all over the world. Recent studies based on in-person interviews have opened up the black box and shown that the crowd is not a collection of independent workers, but instead that workers commun...
C2A: Crowd consensus analytics for virtual colonoscopy
We present a medical crowdsourcing visual analytics platform called C2A to visualize, classify and filter crowdsourced clinical data. More specifically, C2A is used to build consensus on a clinical diagnosis by visualizing crowd responses and filtering out anomalous activity. Crowdsourcing medical applications have recently shown promise where the non-expert users (the crowd) were able to achie...
Automated Support for Collective Memory of Conversational Interactions
Maintaining consistency is a difficult challenge in crowd-powered systems in which constituent crowd workers may change over time. We discuss an initial outline for Chorus:Mnemonic, a system that augments the crowd’s collective memory of a conversation by automatically recovering past knowledge based on topic, allowing the system to support consistent multi-session interactions. We present the ...